SIVOCS Hypothesis Testing

2021-11-26

Example 1: A1 | Problem: Explaining a third, unknown variable through the relation between two variables

Hypothesis: Transdisciplinarity is usually an important cornerstone of social innovation, although it is not a precondition for it. Moreover, not all transdisciplinary research automatically contributes to social innovation. Nevertheless, we assume that greater experience in transdisciplinary research might point to a higher propensity for social innovation.

A1, E1 correlation Matrix:

A1, D1.2, D1.3, D2 correlation Matrix
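
The correlation plots themselves are not reproduced in this text. As a rough sketch of what such a matrix contains, the following base-R snippet computes pairwise Pearson correlations on a small made-up data frame. The column names and values are hypothetical stand-ins, not the survey data, and `use = "pairwise.complete.obs"` is only one assumed way of handling the missing answers.

```r
# Toy stand-in for a1.df; names and values are hypothetical.
toy <- data.frame(
  trans_exp = c(6, 10, 7, 5, 7, 4, NA, 8),
  civsoc    = c(0, 2, 1, 0, 1, 0, 1, NA),
  policy    = c(0, 2, 1, 1, 1, 0, 0, 2)
)

# Pairwise-complete Pearson correlations: each pair of columns uses
# all rows in which both variables are observed.
corr_mat <- cor(toy, use = "pairwise.complete.obs", method = "pearson")
round(corr_mat, 2)
```

Pairwise deletion keeps more observations than `na.omit()`, at the cost that each cell of the matrix may be based on a slightly different subset of respondents.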

A1: Factor analysis

There are two possible approaches to the factor analysis: either include all candidate variables in a single analysis, or run a separate factor analysis for each correlation. Here I include all of them in one analysis.

Find the optimal number of factors:

## 
## Call:
## factanal(x = na.omit(a1.df), factors = 3, rotation = "varimax")
## 
## Uniquenesses:
## transdisciplinaryExp.rate.        groupsInvolved.res. 
##                      0.626                      0.528 
##       groupsInvolved.busi.     groupsInvolved.civsoc. 
##                      0.899                      0.500 
##     groupsInvolved.policy.      groupsInvolved.citiz. 
##                      0.623                      0.539 
##      groupsInvolved.media.    groupsInvolved.welfare. 
##                      0.804                      0.698 
##           motivation.prob.        motivation.welfare. 
##                      0.861                      0.086 
##       benefitForNonAcademy 
##                      0.505 
## 
## Loadings:
##                            Factor1 Factor2 Factor3
## transdisciplinaryExp.rate. 0.148   0.215   0.554  
## groupsInvolved.res.        0.114           0.677  
## groupsInvolved.busi.       0.168   0.161   0.217  
## groupsInvolved.civsoc.     0.685   0.117   0.131  
## groupsInvolved.policy.     0.594   0.149          
## groupsInvolved.citiz.      0.587           0.334  
## groupsInvolved.media.      0.315           0.309  
## groupsInvolved.welfare.    0.463   0.227   0.189  
## motivation.prob.                   0.371          
## motivation.welfare.        0.193   0.920   0.174  
## benefitForNonAcademy       0.296   0.572   0.285  
## 
##                Factor1 Factor2 Factor3
## SS loadings      1.670   1.476   1.186
## Proportion Var   0.152   0.134   0.108
## Cumulative Var   0.152   0.286   0.394
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 65.42 on 25 degrees of freedom.
## The p-value is 1.8e-05
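
The test at the end of this output (p = 1.8e-05) rejects the hypothesis that three factors are sufficient at the usual significance levels, so the factor count may deserve a second look. A minimal sketch of one way to compare candidate counts follows: fit `factanal()` for several values of `factors` and collect the p-value of the sufficiency test. The data are simulated with two underlying factors, and `toy`, `f1`, `f2` and `fit_pvalues` are illustrative names that are not part of the survey analysis.

```r
# Simulate six indicators driven by two independent latent factors.
set.seed(1)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  x1 = f1 + rnorm(n, sd = 0.5),
  x2 = f1 + rnorm(n, sd = 0.5),
  x3 = f1 + rnorm(n, sd = 0.5),
  x4 = f2 + rnorm(n, sd = 0.5),
  x5 = f2 + rnorm(n, sd = 0.5),
  x6 = f2 + rnorm(n, sd = 0.5)
)

# With six variables only one or two factors leave positive degrees
# of freedom for the likelihood-ratio test, so we try k = 1 and k = 2.
candidate_k <- 1:2
fit_pvalues <- sapply(candidate_k, function(k) {
  fit <- factanal(toy, factors = k, rotation = "varimax")
  unname(fit$PVAL)  # p-value of "k factors are sufficient"
})
names(fit_pvalues) <- paste0("factors_", candidate_k)
fit_pvalues
```

A small p-value means the corresponding number of factors is judged insufficient. On the survey data the same loop could be run over a wider range of `factors`, as long as the degrees of freedom stay positive.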

A1: Variable combination

According to the correlation matrix and the factor analysis, the following E1 variables can be combined:

a1.df$civsoc.policy.citiz.welfare <- (a1.df$groupsInvolved.civsoc. + a1.df$groupsInvolved.policy. + a1.df$groupsInvolved.welfare. + a1.df$groupsInvolved.citiz. )/4 

Another approach would be to treat all groups other than researchers as a single stakeholder-engagement variable:

a1.df$e1_diff_res <- (a1.df$groupsInvolved.civsoc. + a1.df$groupsInvolved.policy. + a1.df$groupsInvolved.welfare. + a1.df$groupsInvolved.citiz. + a1.df$groupsInvolved.busi. + a1.df$groupsInvolved.media.)/6 

Also, from the D questions, the following variables show a strong correlation:

a1.df$welfare.nonAcademy <- (a1.df$groupsInvolved.welfare. + a1.df$benefitForNonAcademy)/2
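
Before averaging items into one score, it can help to check whether they are internally consistent. The sketch below computes Cronbach's alpha in base R and then forms the row mean. This check is not part of the original analysis; `cronbach_alpha()` is a helper written for this illustration, the toy scores are invented, and listwise deletion of missing values is an assumption.

```r
# Cronbach's alpha for a set of items (rows = respondents, columns = items).
cronbach_alpha <- function(items) {
  items <- na.omit(as.data.frame(items))  # listwise deletion (assumption)
  k <- ncol(items)
  item_vars <- sapply(items, var)         # variance of each single item
  total_var <- var(rowSums(items))        # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical 0-2 involvement scores for four stakeholder groups.
toy <- data.frame(
  civsoc  = c(0, 2, 1, 0, 1, 0, 2, 1),
  policy  = c(0, 2, 1, 1, 1, 0, 2, 0),
  welfare = c(0, 1, 1, 0, 2, 0, 2, 1),
  citiz   = c(1, 2, 0, 0, 1, 0, 2, 1)
)
alpha_toy <- cronbach_alpha(toy)

# The combined score is a row mean, equivalent to summing and dividing by 4.
toy$combined <- rowMeans(toy[, c("civsoc", "policy", "welfare", "citiz")])
```

Values of alpha close to 1 suggest the items move together. How high is high enough is a judgment call that depends on the purpose of the combined variable.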

Correlation after combination:

library(psych)  # provides pairs.panels()
pairs.panels(a1.df[, c("transdisciplinaryExp.rate.", 
                       "civsoc.policy.citiz.welfare", 
                       "groupsInvolved.busi.",
                       "groupsInvolved.media.",
                       "welfare.nonAcademy",
                       "motivation.prob."
                       )], 
             method = "pearson", # correlation method
             hist.col = "#00AFBB",
             density = TRUE,  # show density plots
             ellipses = TRUE, # show correlation ellipses
             lm = TRUE
             )

A1: Linear Model

Explaining stakeholder involvement through transdisciplinary experience does not make much sense, since transdisciplinary research should involve stakeholder engagement anyway. We can only compare the effect of the motivation variables with the effect of transdisciplinarity.

Dependent variable: civsoc.policy.citiz.welfare

Predictors                   Estimates   CI               p
(Intercept)                  -0.07       -0.20 – 0.05     0.255
motivation prob              -0.00       -0.01 – 0.01     0.912
motivation welfare            0.04        0.03 – 0.05     <0.001
transdisciplinaryExp rate     0.03        0.01 – 0.04     0.001
Observations                 330
R2 / R2 adjusted             0.175 / 0.167

A linear model with the whole e1 combination:

Dependent variable: e1_diff_res

Predictors                   Estimates   CI               p
(Intercept)                  -0.02       -0.13 – 0.08     0.698
motivation prob               0.00       -0.01 – 0.01     0.973
motivation welfare            0.03        0.02 – 0.05     <0.001
transdisciplinaryExp rate     0.02        0.01 – 0.04     <0.001
Observations                 327
R2 / R2 adjusted             0.186 / 0.179
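
If the goal is to compare the effect of the motivation variables with the effect of transdisciplinarity, raw estimates are only comparable when the predictors have a similar spread. A minimal sketch, on simulated data, of putting all variables on a common scale with `scale()` before fitting is given below. The variable names and the coefficients used to simulate the data are assumptions made for illustration, not survey results.

```r
set.seed(42)
n <- 200
toy <- data.frame(
  motivation_welfare = sample(0:10, n, replace = TRUE),
  trans_exp          = sample(0:10, n, replace = TRUE)
)
toy$engagement <- 0.04 * toy$motivation_welfare +
  0.03 * toy$trans_exp + rnorm(n, sd = 0.3)

# z-standardise every column, then refit: the coefficients are now in
# standard-deviation units and can be compared with each other.
toy_z   <- as.data.frame(scale(toy))
fit_raw <- lm(engagement ~ motivation_welfare + trans_exp, data = toy)
fit_z   <- lm(engagement ~ motivation_welfare + trans_exp, data = toy_z)

coef(fit_raw)
coef(fit_z)
```

Standardising changes neither the fit nor the p-values of the slopes. It only rescales the estimates so that their magnitudes can be read side by side.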

Example 2: C1 | Problem: Too many independent variables with multicollinearity -> Unstable models

Hypothesis: Self-assessment is an important variable for identifying the contribution to social innovation. The higher the self-assessment, the higher the real contribution to social innovation probably is. We assume, however, a trend towards underestimation. Therefore, it is important to compare the self-assessment against other indicators.

Inner correlations in E1

corr_matrix_plt(e1.df[, -1])

head(e1.modf_civil)
##   bus med civ
## 1   1   0 0.0
## 2   0   0 0.0
## 3   2   1 0.0
## 4   0   1 0.5
## 5   0   0 0.0
## 6   0   1 0.5

Inner correlations in D1.2 and D1.3

corr_matrix_plt(d1.df[, c(2,3)])

Correlation Matrix of C1 with modified E1 and D1.2/3

corr_matrix_plt(cbind(c3.df, e1.modf_civil, d1.df[, c(2,3)]))

LR Model

Alternative 1

Dependent variable: c3

Predictors                   Estimates   CI               p
(Intercept)                   0.76       -0.66 – 2.19     0.290
bus                          -0.40       -1.00 – 0.20     0.185
med                           0.38       -0.35 – 1.10     0.307
civ                           0.10       -2.64 – 2.85     0.940
motivation prob               0.23 **     0.07 – 0.39     0.006
motivation welfare            0.40 ***    0.20 – 0.59     <0.001
civ * motivation welfare      0.07       -0.27 – 0.41     0.676
Observations                 100
R2 / R2 adjusted             0.475 / 0.441
  * p<0.05   ** p<0.01   *** p<0.001

Alternative 2

Dependent variable: c3

Predictors                   Estimates   CI               p
(Intercept)                   0.82       -0.56 – 2.20     0.241
civ                           0.00       -2.69 – 2.69     0.997
motivation prob               0.22 **     0.06 – 0.39     0.007
motivation welfare            0.41 ***    0.22 – 0.60     <0.001
civ * motivation welfare      0.08       -0.25 – 0.40     0.635
Observations                 103
R2 / R2 adjusted             0.455 / 0.433
  * p<0.05   ** p<0.01   *** p<0.001

Including several variables is almost impossible. Any suggestions?
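
One common way to quantify the multicollinearity named in the heading of this example is the variance inflation factor (VIF), which for predictor j equals 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. The sketch below implements it in base R on simulated data. `vif_base()` is a helper written for this illustration and the toy columns are invented, so treat it as a starting point rather than the analysis actually used.

```r
# VIF for each column of a predictor data frame.
vif_base <- function(predictors) {
  predictors <- na.omit(as.data.frame(predictors))
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    # Regress predictor v on all remaining predictors.
    aux <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

set.seed(7)
n <- 100
toy <- data.frame(civ = rnorm(n), motivation_prob = rnorm(n))
# A predictor that nearly duplicates `civ`, to provoke collinearity.
toy$civ_copy <- toy$civ + rnorm(n, sd = 0.1)

vif_toy <- vif_base(toy)
vif_toy
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a warning sign. Predictors flagged this way are candidates for being dropped or merged into a combined score.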

Example 3: D1 | Problem: What are we modeling & “Water is wet”

Hypothesis: The motivation to directly address a natural, technical, economic or social problem, or even to improve the human condition/welfare, can be a strong component of social innovation, although it is not a precondition. The motivation to better understand a natural, technical, economic or social phenomenon, however, points to a rather “regular” scientific motivation, without directly aiming at problem solving or improving human/welfare conditions.

Reformulation: Motivation to address a natural, technical, economic or social problem, or even to improve the human condition/welfare, is more likely to lead to new or better services, products, processes, or ways of doing things than motivation to better understand a natural, technical, economic or social phenomenon.

We have 3 different variables in D1:

## [1] "motivation.pheno."   "motivation.prob."    "motivation.welfare."

How do the independent variables correlate with each other? Should we handle the collinearity between them? We want to combine the last two, but the first two correlate more strongly.

Dependent variable G1: To what degree has your project directly contributed to new or better services, products, processes, or ways of doing things that were targeted towards …

head(g1.df)
##   impactTargetGroup.pub. impactTargetGroup.busi. impactTargetGroup.socgr.
## 1                      0                       0                        0
## 2                      0                       0                        0
## 3                      1                       5                        0
## 4                      0                       0                        0
## 5                      5                       0                        0
## 6                      2                       0                        3
##   impactTargetGroup.welfare. impactTargetGroup.civsoc.
## 1                          0                         0
## 2                          0                         0
## 3                          0                         0
## 4                          0                         0
## 5                          0                         0
## 6                          4                         0
##   impactTargetGroup.policy. impactTargetGroup.acad.
## 1                         0                      10
## 2                         0                       7
## 3                         0                       8
## 4                         0                      10
## 5                         0                      10
## 6                         2                       7

Combining all G1 sub-variables other than acad. into a single variable for testing purposes

library(plotly)
barplot(table(g1.modf_civil))
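
The construction of `g1.modf_civil` is not shown in this report. One hypothetical way to build such a combined score is to average every G1 column except the academic one. The snippet below illustrates this on the six rows printed by `head(g1.df)` above. It is an assumption about the approach, not the code that was actually used, and `g1_head`, `non_acad` and `combined_sketch` are names invented here.

```r
# The six rows shown by head(g1.df), typed in by hand.
g1_head <- data.frame(
  impactTargetGroup.pub.     = c(0, 0, 1, 0, 5, 2),
  impactTargetGroup.busi.    = c(0, 0, 5, 0, 0, 0),
  impactTargetGroup.socgr.   = c(0, 0, 0, 0, 0, 3),
  impactTargetGroup.welfare. = c(0, 0, 0, 0, 0, 4),
  impactTargetGroup.civsoc.  = c(0, 0, 0, 0, 0, 0),
  impactTargetGroup.policy.  = c(0, 0, 0, 0, 0, 2),
  impactTargetGroup.acad.    = c(10, 7, 8, 10, 10, 7)
)

# Average all target groups except academia (hypothetical combination rule).
non_acad <- setdiff(names(g1_head), "impactTargetGroup.acad.")
combined_sketch <- rowMeans(g1_head[, non_acad])
combined_sketch
```

Other combination rules, such as the row maximum or a count of non-zero groups, would give a differently distributed outcome, so the choice should follow from what the dependent variable is meant to capture.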

Correlation between D1 variables and G1

d1.test.df <- as.data.frame(cbind(d1.df, 
                                  "civ.prod" = g1.modf_civil))

corr_matrix_plt(d1.test.df)

LM model

d1.test.lm <- lm(data = d1.test.df, civ.prod ~ motivation.pheno. + motivation.prob. + motivation.welfare. )

tab_model(d1.test.lm)
Dependent variable: civ.prod

Predictors                   Estimates   CI               p
(Intercept)                   0.15       -0.47 – 0.78     0.629
motivation pheno              0.04       -0.04 – 0.11     0.368
motivation prob              -0.01       -0.08 – 0.06     0.820
motivation welfare            0.31        0.25 – 0.36     <0.001
Observations                 336
R2 / R2 adjusted             0.292 / 0.286

What did we find out? That social motivation produces products for society?

Example 4: E1 | Relatively straightforward

corr_matrix_plt(cbind(data$familiarWithSI.response., e1.df$groupsInvolved.res., e1.df_trans))

corr_matrix_plt(cbind(e1.df_trans, data$familiarWithSI.response., data$transdisciplinaryExp.rate., d1.df[, 2:3]))

e1.df_model <- data.frame(cbind(e1.df_trans, data$familiarWithSI.response., data$transdisciplinaryExp.rate., d1.df[, 2:3]))
e1.model <- lm(data=e1.df_model, e1.df_trans ~ data$familiarWithSI.response. + data$transdisciplinaryExp.rate. + d1.df[, 2] + d1.df[, 3])
summary(e1.model)
## 
## Call:
## lm(formula = e1.df_trans ~ data$familiarWithSI.response. + data$transdisciplinaryExp.rate. + 
##     d1.df[, 2] + d1.df[, 3], data = e1.df_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59525 -0.20926 -0.04866  0.14573  1.44388 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -2.184e-02  4.996e-02  -0.437    0.662    
## data$familiarWithSI.response.    4.455e-02  6.367e-03   6.998 1.52e-11 ***
## data$transdisciplinaryExp.rate.  7.774e-03  6.489e-03   1.198    0.232    
## d1.df[, 2]                      -9.305e-05  5.647e-03  -0.016    0.987    
## d1.df[, 3]                       2.794e-02  5.683e-03   4.916 1.41e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3045 on 322 degrees of freedom
##   (34 observations deleted due to missingness)
## Multiple R-squared:  0.2936, Adjusted R-squared:  0.2848 
## F-statistic: 33.46 on 4 and 322 DF,  p-value: < 2.2e-16
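
Because the formula above refers to `data$...` and `d1.df[, 2]` directly, the predictors appear to be looked up in the global environment rather than in `e1.df_model`, and the coefficient labels are hard to read. A hedged sketch of the same model structure with named columns follows. The data are simulated, and the short column names (`stakeholder`, `familiar_si`, and so on) and simulation coefficients are invented for illustration.

```r
set.seed(3)
n <- 150
model_df <- data.frame(
  familiar_si        = sample(0:10, n, replace = TRUE),
  trans_exp          = sample(0:10, n, replace = TRUE),
  motivation_prob    = sample(0:10, n, replace = TRUE),
  motivation_welfare = sample(0:10, n, replace = TRUE)
)
model_df$stakeholder <- 0.045 * model_df$familiar_si +
  0.028 * model_df$motivation_welfare + rnorm(n, sd = 0.3)

# Every variable is a column of model_df, so `data =` is actually used
# and the coefficient table carries readable names.
fit <- lm(stakeholder ~ familiar_si + trans_exp +
            motivation_prob + motivation_welfare,
          data = model_df)
summary(fit)
```

Keeping all variables in one data frame also makes functions such as `predict()` with `newdata` behave as expected, which is not the case when the formula reaches into other objects.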

REST




# why do I even have to do this
##knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())

data <- read.csv("../01_data/results-survey718586.csv")

colnames(data)[142:ncol(data)] <- c("persID",
                                    "gender",
                                    "projID",
                                    "title",
                                    "discipline",
                                    "funding",
                                    "runtime"
                                    )

# Hypothesis 1:  Transdisciplinarity is usually an important 
# cornerstone for social innovation, although it is no condition 
# for it. Moreover, not all transdisciplinary research automatically 
# contributes to social innovation. Nevertheless, we assume that
# a higher experience in transdisciplinary research might point to
# a higher propensity for social innovation.


trans_exp <- data$transdisciplinaryExp.rate.
data$transdisexp <- ifelse(trans_exp <= 3, 1,
                      ifelse(trans_exp %in% 4:6, 2,
                        ifelse(trans_exp >= 7, 3, 99)))
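
The nested `ifelse()` above bins the rating into three groups. A more compact alternative, sketched here under the assumption that the rating only takes integer values from 0 to 10 (plus missing values), is `cut()` with explicit break points. The vector below is made up to cover every possible value.

```r
trans_exp <- c(0:10, NA)

# Nested ifelse() as used in the analysis.
nested <- ifelse(trans_exp <= 3, 1,
            ifelse(trans_exp %in% 4:6, 2,
              ifelse(trans_exp >= 7, 3, 99)))

# Same binning with cut(): intervals (-Inf, 3], (3, 6], (6, Inf).
binned <- cut(trans_exp, breaks = c(-Inf, 3, 6, Inf), labels = FALSE)

# Compare the two codings element by element, including the NA.
identical(as.numeric(nested), as.numeric(binned))
```

With `labels = c("low", "medium", "high")` instead of `labels = FALSE`, `cut()` would return a factor, which makes the grouping explicit in plots and model output.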
  
data$groupsInvolved.citiz.
##   [1]  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0
##  [26]  0  0  0  2  0  1  2  0  0  0  2  0  0  1  0  0  0 NA  0  0  2  0 NA  0  0
##  [51]  0  0  1  0  1  2  0  0  2  0  0  0  2  0  1  0  0  1  0  1  0  0  0  0  0
##  [76]  1  0  0  0  0  0  0  0  0  0  0  1  0  0  2  0  0  0  0  0  0  2  2  0  0
## [101]  0  0  0  0  0  2  1  2  0 NA  0  1  0  2  0  1  0  0  1  1  0  0  0  0  0
## [126]  0  0  2  0  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  1  0  0  0  0  0
## [151]  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  2  0  0  0
## [176]  0  0  0  0  0  0  1  0  1  0  0  2  0  0  0  0  2  0  2  0  0  0  0  0  1
## [201]  0 NA  0  0  0  1  0  1  0  0 NA  0  0  1  1  0  0  0  0  0  0  0  0  1  0
## [226]  0  0  1  1  1  1  1  0  2  0  0  2  0  0  1  0  2  0  0  0  0  1  0  2  1
## [251]  0  0  2  0 NA  2  2  0  1  0  1  1  0  0  0  0  0  0  2  0  0  1  0  0 NA
## [276]  0  0  2  0  0  0  1  0 NA  2  0  0  2  0  0  0  0  1  0  0  0  0  0  0  0
## [301]  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  2  0  0  0  0  2  0
## [326]  0  1  0  0  0  1  0  0  0  0  1  2  1  0  0  1  0  0  1  0  1  0  0  0  0
## [351]  0  0  0  0  0  0  2  2  1  0  0
lm(data = data, transdisexp ~ groupsInvolved.citiz.)
## 
## Call:
## lm(formula = transdisexp ~ groupsInvolved.citiz., data = data)
## 
## Coefficients:
##           (Intercept)  groupsInvolved.citiz.  
##                 2.085                  0.344
str(data)
## 'data.frame':    361 obs. of  149 variables:
##  $ id                             : int  2 3 4 5 6 7 8 10 11 12 ...
##  $ submitdate                     : chr  "2021-09-28 02:56:57" "2021-09-27 06:21:56" "2021-09-27 06:29:16" "2021-09-27 06:30:39" ...
##  $ lastpage                       : int  9 9 9 9 9 9 9 9 9 9 ...
##  $ startlanguage                  : chr  "en" "en" "en" "en" ...
##  $ seed                           : int  1076101691 744509311 343728637 732155726 1256303658 1857465652 1981445861 113241726 1459574906 1794364104 ...
##  $ token                          : chr  "jqhNrCMLanMcVsA" "I32cwAzs6xYoGXI" "hp0jXAR3PcNHDMB" "vevhlz7zAMImcau" ...
##  $ startdate                      : chr  "2021-09-27 06:03:26" "2021-09-27 06:14:57" "2021-09-27 06:20:26" "2021-09-27 06:23:02" ...
##  $ datestamp                      : chr  "2021-09-28 02:56:57" "2021-09-27 06:21:56" "2021-09-27 06:29:16" "2021-09-27 06:30:39" ...
##  $ ipaddr                         : chr  "114.73.153.25" "2a02:aa16:3881:580:9992:dd1f:3222:e418" "129.132.90.13" "188.216.40.141" ...
##  $ refurl                         : chr  "" "" "" "" ...
##  $ transdisciplinaryExp.rate.     : int  6 10 7 5 7 4 7 10 NA 8 ...
##  $ age                            : chr  "40ies" "50ies" "50ies" "50ies" ...
##  $ academicAge                    : chr  "e" "e" "e" "e" ...
##  $ familiarWithSI.response.       : int  8 8 1 4 0 5 5 0 4 7 ...
##  $ projectReference               : chr  "Y" "Y" "Y" "Y" ...
##  $ projectReferenceNo             : logi  NA NA NA NA NA NA ...
##  $ contribToSI.rate.              : int  0 2 NA NA NA 4 2 NA NA 3 ...
##  $ motivation.pheno.              : int  8 10 9 10 7 8 2 10 4 10 ...
##  $ motivation.prob.               : int  8 2 9 5 7 9 2 10 4 7 ...
##  $ motivation.welfare.            : int  8 3 5 0 10 3 2 0 10 7 ...
##  $ benefitForNonAcademy           : int  2 0 1 0 0 1 1 0 1 1 ...
##  $ impulseForNonAcad.soc.         : chr  "" "" "" "" ...
##  $ impulseForNonAcad.econ.        : chr  "" "" "" "" ...
##  $ impulseForNonAcad.ecol.        : chr  "" "" "" "" ...
##  $ impulseForNonAcad.health.      : chr  "" "Y" "" "" ...
##  $ impulseForNonAcad.tech.        : chr  "Y" "" "" "" ...
##  $ impulseForNonAcad.other.       : chr  "" "" "curiosity" "scientific couriosity" ...
##  $ groupsInvolved.res.            : int  2 2 2 2 1 1 0 2 1 1 ...
##  $ groupsInvolved.busi.           : int  1 0 2 0 0 0 0 0 0 1 ...
##  $ groupsInvolved.civsoc.         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ groupsInvolved.policy.         : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ groupsInvolved.citiz.          : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ groupsInvolved.media.          : int  0 0 1 1 0 1 0 0 0 1 ...
##  $ groupsInvolved.welfare.        : int  0 0 0 1 0 1 0 0 0 0 ...
##  $ natureOfInvolvement.res.       : chr  "colla" "colla" "colla" "colla" ...
##  $ natureOfInvolvement.busi.      : chr  "cons" "" "colla" "" ...
##  $ natureOfInvolvement.civsoc.    : chr  "" "" "" "" ...
##  $ natureOfInvolvement.policy.    : chr  "" "" "" "" ...
##  $ natureOfInvolvement.citiz.     : chr  "" "" "" "cons" ...
##  $ natureOfInvolvement.media.     : chr  "" "" "cons" "cons" ...
##  $ natureOfInvolvement.welfare.   : chr  "" "" "" "cons" ...
##  $ targetGroupsGoals.socneeds.    : int  0 NA 0 NA NA NA 0 NA 1 0 ...
##  $ targetGroupsGoals.socgroups.   : int  0 NA 0 NA NA NA 0 NA 0 0 ...
##  $ targetGroupsGoals.improve.     : int  1 NA 1 NA NA 1 0 NA 1 0 ...
##  $ targetGroupsGoals.empower.     : int  0 NA 1 NA NA NA 0 NA 0 0 ...
##  $ targetGroupsGoals.diversity.   : int  0 NA 1 NA NA NA 1 NA 0 0 ...
##  $ concepts.pub.                  : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ concepts.data.                 : int  0 1 1 1 1 NA 1 1 1 0 ...
##  $ concepts.code.                 : int  0 1 0 0 1 1 0 0 0 1 ...
##  $ concepts.infra.                : int  0 1 0 0 1 0 0 0 0 1 ...
##  $ concepts.review.               : int  0 1 0 1 0 0 0 0 0 0 ...
##  $ concepts2                      : chr  "N" "N" "N" "N" ...
##  $ concepts3                      : chr  "N" "N" "N" "N" ...
##  $ impactTargetGroup.pub.         : int  0 0 1 0 5 2 2 0 4 6 ...
##  $ impactTargetGroup.busi.        : int  0 0 5 0 0 0 2 0 5 7 ...
##  $ impactTargetGroup.socgr.       : int  0 0 0 0 0 3 1 0 2 0 ...
##  $ impactTargetGroup.welfare.     : int  0 0 0 0 0 4 1 0 4 0 ...
##  $ impactTargetGroup.civsoc.      : int  0 0 0 0 0 0 1 0 1 0 ...
##  $ impactTargetGroup.policy.      : int  0 0 0 0 0 2 1 0 1 0 ...
##  $ impactTargetGroup.acad.        : int  10 7 8 10 10 7 8 0 6 9 ...
##  $ kindOfChange.pub.              : chr  "" "" "" "" ...
##  $ kindOfChange.busi.             : chr  "" "" "und" "" ...
##  $ kindOfChange.socgr.            : chr  "" "" "" "" ...
##  $ kindOfChange.welfare.          : chr  "" "" "" "" ...
##  $ kindOfChange.civsoc.           : chr  "" "" "" "" ...
##  $ kindOfChange.policy.           : chr  "" "" "" "" ...
##  $ kindOfChange.acad.             : chr  "und" "att" "und" "und" ...
##  $ kindOfChangeOther              : chr  "" "" "" "" ...
##  $ adoptByPolicy.rate.            : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ adoptByPolicyHow.SQ001.        : chr  "" "" "" "" ...
##  $ adoptByPolicyHow.SQ002.        : chr  "" "" "" "" ...
##  $ adoptByPolicyHow.SQ003.        : chr  "" "" "" "" ...
##  $ adoptByPolicyHow.other.        : chr  "" "" "" "" ...
##  $ Impactstatements.capab.        : int  99 0 0 0 99 NA 0 0 5 0 ...
##  $ Impactstatements.emanc.        : int  99 0 0 0 99 NA 0 0 2 0 ...
##  $ Impactstatements.understanding.: int  99 0 0 0 99 NA 0 0 1 0 ...
##  $ Impactstatements.mitig.        : int  99 0 0 0 99 NA 0 0 2 0 ...
##  $ Impactstatements.unknown.      : int  5 0 5 0 99 NA 0 0 7 0 ...
##  $ Impactstatements.unaddressed.  : int  10 6 5 0 7 NA 8 5 4 0 ...
##  $ dissChannels.peer.             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ dissChannels.mono.             : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ dissChannels.conf.             : int  1 1 1 1 0 1 1 0 1 1 ...
##  $ dissChannels.policy.           : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ dissChannels.trad.             : int  1 0 1 1 0 0 0 0 0 0 ...
##  $ dissChannels.prof.             : int  1 0 1 1 0 1 0 0 0 0 ...
##  $ dissChannels.web.              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ dissChannels.socmed.           : int  1 0 1 1 1 0 0 0 0 0 ...
##  $ dissChannels.platf.            : int  0 0 1 0 0 NA 0 0 0 0 ...
##  $ dissChannels.consult.          : int  0 1 1 0 0 NA 0 0 0 0 ...
##  $ dissChannels.events.           : int  0 0 0 1 0 1 0 0 0 0 ...
##  $ dissChannels.public.           : int  1 1 0 0 1 0 0 0 0 0 ...
##  $ scalabilityRating.up.          : int  NA NA 8 NA 99 7 NA NA 7 7 ...
##  $ scalabilityRating.out.         : int  NA NA 10 NA 99 6 NA NA 4 0 ...
##  $ scalabilityRating.deep.        : int  NA NA 8 NA 99 6 NA NA 3 0 ...
##  $ interestInSummary              : int  1 1 0 1 1 0 1 0 0 1 ...
##  $ SIthroughSNSF                  : chr  "" "This should not be done. SNSF should be about science, not political meddling. The contribution of science to s"| __truncated__ "" "" ...
##  $ resultsHighlight               : chr  "The project had delivered a new molecular concepts to achieve deep blue emitters for organic light emitting dev"| __truncated__ "" "" "" ...
##  $ feedback                       : chr  "Although the primary project design was not to develop sensors for monitoring long COVID patients, one of the u"| __truncated__ "" "" "" ...
##  $ thankYou                       : logi  NA NA NA NA NA NA ...
##   [list output truncated]
ggplot(data, aes(x = transdisexp, y = groupsInvolved.citiz.)) + 
  stat_smooth(method = "lm") +
  geom_jitter()  # jittered points only; adding geom_point() would draw every point twice

data$groupsInvolved.res.
##   [1] 2 2 2 2 1 1 0 2 1 1 1 0 2 0 2 1 2 2 2 1 2 0 1 1 0 0 0 1 2 2 1 1 1 1 2 2 1
##  [38] 2 2 2 1 1 1 1 0 2 1 1 1 0 1 0 2 1 2 1 2 2 2 0 0 2 2 2 1 1 1 2 0 1 1 1 1 0
##  [75] 1 1 1 0 2 0 2 0 1 1 2 0 2 1 0 2 1 0 1 1 2 0 1 2 0 1 1 2 2 2 1 2 2 2 0 2 2
## [112] 2 1 2 1 2 2 1 0 1 2 1 0 1 2 0 2 2 2 0 1 1 0 0 0 1 1 1 2 2 2 2 1 2 2 2 2 2
## [149] 0 0 2 1 2 2 0 1 2 2 0 0 1 1 0 2 1 0 0 2 0 2 1 1 2 0 2 0 1 1 2 2 2 2 2 1 1
## [186] 1 2 0 0 0 0 0 0 1 0 1 1 0 0 2 0 2 1 2 1 2 1 2 2 1 1 1 2 1 1 2 1 1 2 1 0 1
## [223] 2 2 2 2 2 1 2 1 2 2 1 1 1 2 2 0 1 2 1 2 2 0 1 2 1 0 2 0 2 0 2 2 2 1 2 0 2
## [260] 1 2 1 0 1 1 0 1 2 0 1 0 1 1 1 1 2 0 1 1 0 0 1 0 1 2 2 1 2 0 0 1 2 2 0 0 2
## [297] 2 2 1 1 0 2 2 2 1 1 2 0 2 1 2 2 1 1 0 1 2 2 2 1 2 0 1 2 1 0 2 1 1 2 1 2 0
## [334] 1 2 2 2 2 2 1 2 0 0 1 2 1 1 2 1 0 2 0 1 2 2 2 2 1 1 2 2
ggplot(data, aes(x = transdisexp, y = groupsInvolved.res.)) + 
  stat_smooth(method = "lm") +
  geom_jitter()  # jittered points only; adding geom_point() would draw every point twice

data$groupsInvolved.civsoc.
##   [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0  0  0  0
##  [26]  0  0  0  2  0  0 NA  0  0  0  2  0  0  0  0  0  0  2  0  0  1  0  1  0  0
##  [51]  0  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  1
##  [76]  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  2  0  0  0  2  0  0  0
## [101]  0 NA  0  0  0  0  1  0  1 NA  0  2  0  0  0  0  0  0 NA  2  0  1  0  0  0
## [126]  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  1  1  0  0  0  0  0
## [151]  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0
## [176]  0  0  0  0  0  0  1  0  0  1  0  0  0  0  0  0  2  0  0  0  0  0  0  0  0
## [201]  0  0  0  0  0  0  0  0  0  0 NA  2  0  0  0  0  0  2  0  0  0  0  0  1  0
## [226]  0  0  0  2  0  0  1  0  0  1  0  0  1  0  0  0  1  0  0  0  0  0  0  2  0
## [251]  0  0  1  1 NA  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 NA
## [276]  0  0  1  0  0  0  1  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0
## [301]  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  1  0  0  2  0
## [326]  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  1  0  0  1  0  0  0  0  0  0
## [351]  0  0  0  0  1  0  0  2  0  0  0
a <- lm(data=data, transdisexp ~ groupsInvolved.citiz. + groupsInvolved.res. + groupsInvolved.citiz.:groupsInvolved.res.)
b <- lm(data=data, transdisexp ~ groupsInvolved.citiz. + groupsInvolved.res.)

c <- lm(data=data, transdisexp ~ groupsInvolved.citiz. + groupsInvolved.res. + groupsInvolved.civsoc. + groupsInvolved.citiz.:groupsInvolved.res.)

summary(a)
## 
## Call:
## lm(formula = transdisexp ~ groupsInvolved.citiz. + groupsInvolved.res. + 
##     groupsInvolved.citiz.:groupsInvolved.res., data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6645 -0.6476  0.1252  0.5457  1.3523 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                1.64765    0.07852  20.985  < 2e-16
## groupsInvolved.citiz.                      0.24779    0.16409   1.510    0.132
## groupsInvolved.res.                        0.40331    0.05837   6.910 2.39e-11
## groupsInvolved.citiz.:groupsInvolved.res. -0.01877    0.09849  -0.191    0.849
##                                              
## (Intercept)                               ***
## groupsInvolved.citiz.                        
## groupsInvolved.res.                       ***
## groupsInvolved.citiz.:groupsInvolved.res.    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7542 on 341 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.1998, Adjusted R-squared:  0.1927 
## F-statistic: 28.38 on 3 and 341 DF,  p-value: < 2.2e-16
summary(b)
## 
## Call:
## lm(formula = transdisexp ~ groupsInvolved.citiz. + groupsInvolved.res., 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6697 -0.6523  0.1113  0.5493  1.3477 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.65227    0.07458  22.154  < 2e-16 ***
## groupsInvolved.citiz.  0.21900    0.06407   3.418 0.000707 ***
## groupsInvolved.res.    0.39919    0.05415   7.372 1.28e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7531 on 342 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.1997, Adjusted R-squared:  0.195 
## F-statistic: 42.66 on 2 and 342 DF,  p-value: < 2.2e-16
summary(c)
## 
## Call:
## lm(formula = transdisexp ~ groupsInvolved.citiz. + groupsInvolved.res. + 
##     groupsInvolved.civsoc. + groupsInvolved.citiz.:groupsInvolved.res., 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6306 -0.6440  0.1049  0.5565  1.3560 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                1.64403    0.07900  20.809  < 2e-16
## groupsInvolved.citiz.                      0.23911    0.17343   1.379    0.169
## groupsInvolved.res.                        0.39972    0.05876   6.803 4.69e-11
## groupsInvolved.civsoc.                     0.07735    0.09559   0.809    0.419
## groupsInvolved.citiz.:groupsInvolved.res. -0.02597    0.10135  -0.256    0.798
##                                              
## (Intercept)                               ***
## groupsInvolved.citiz.                        
## groupsInvolved.res.                       ***
## groupsInvolved.civsoc.                       
## groupsInvolved.citiz.:groupsInvolved.res.    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7568 on 337 degrees of freedom
##   (19 observations deleted due to missingness)
## Multiple R-squared:  0.2013, Adjusted R-squared:  0.1919 
## F-statistic: 21.24 on 4 and 337 DF,  p-value: 1.233e-15
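
Models `a` and `b` differ only by the interaction term. In the output above that term is far from significant (p = 0.849) and the adjusted R-squared is slightly lower with it (0.1927 versus 0.195). A nested-model F-test makes the same comparison formally. The sketch below uses simulated data and invented names (`m_main`, `m_inter`, `cmp`). Note that model `c` drops three more observations than `a` and `b`, so it would have to be refitted on a common subset before being compared in this way.

```r
set.seed(11)
n <- 300
toy <- data.frame(
  citiz = sample(0:2, n, replace = TRUE),
  res   = sample(0:2, n, replace = TRUE)
)
# Simulated outcome without any true interaction effect.
toy$transdisexp <- 1.65 + 0.22 * toy$citiz + 0.40 * toy$res +
  rnorm(n, sd = 0.75)

m_main  <- lm(transdisexp ~ citiz + res, data = toy)
m_inter <- lm(transdisexp ~ citiz + res + citiz:res, data = toy)

# F-test for the single added interaction term.
cmp <- anova(m_main, m_inter)
cmp
```

A large p-value in the second row of the table indicates that the interaction does not improve the fit noticeably, which would support keeping the simpler model.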
e1.colnames <- grep("groupsInvolved\\.", colnames(data), value = TRUE)

# Corr. Matrix of all A1-E1
chart.Correlation(data[, c("transdisexp",e1.colnames)], histogram=TRUE)

Introduction

The Tufte handout style is a style that Edward Tufte uses in his books and handouts. Tufte’s style is known for its extensive use of sidenotes, tight integration of graphics with text, and well-set typography. This style has been implemented in LaTeX and HTML/CSS (see the GitHub repositories tufte-latex and tufte-css, respectively). We have ported both implementations into the tufte package. If you want LaTeX/PDF output, you may use the tufte_handout format for handouts, and tufte_book for books. For HTML output, use tufte_html. These formats can be either specified in the YAML metadata at the beginning of an R Markdown document (see an example below), or passed to the rmarkdown::render() function. See Allaire et al. (2021) for more information about rmarkdown.

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2021. rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.

---
title: "An Example Using the Tufte Style"
author: "John Smith"
output:
  tufte::tufte_handout: default
  tufte::tufte_html: default
---

There are two goals of this package:

  1. To produce both PDF and HTML output with similar styles from the same R Markdown document;
  2. To provide simple syntax to write elements of the Tufte style such as side notes and margin figures, e.g. when you want a margin figure, all you need to do is the chunk option fig.margin = TRUE, and we will take care of the details for you, so you never need to think about \begin{marginfigure} \end{marginfigure} or <span class="marginfigure"> </span>; the LaTeX and HTML code under the hood may be complicated, but you never need to learn or write such code.

If you have any feature requests or find bugs in tufte, please do not hesitate to file them to https://github.com/rstudio/tufte/issues. For general questions, you may ask them on StackOverflow: https://stackoverflow.com/tags/rmarkdown.

Headings

This style provides first and second-level headings (that is, # and ##), demonstrated in the next section. You may get unexpected output if you try to use ### and smaller headings.

In his later books (for example, Beautiful Evidence), Tufte starts each section with a bit of vertical space, a non-indented paragraph, and sets the first few words of the sentence in small caps. To accomplish this using this style, call the newthought() function in tufte in an inline R expression `r ` as demonstrated at the beginning of this paragraph. Note that you should not assume tufte has been attached to your R session: either call library(tufte) in your R Markdown document before you call newthought(), or use tufte::newthought().

Figures

Margin Figures

Images and graphics play an integral role in Tufte’s work. To place figures in the margin you can use the knitr chunk option fig.margin = TRUE. For example:

[Figure: MPG vs horsepower, colored by transmission.]

library(ggplot2)
mtcars2 <- mtcars
mtcars2$am <- factor(
  mtcars$am, labels = c('automatic', 'manual')
)
ggplot(mtcars2, aes(hp, mpg, color = am)) +
  geom_point() + geom_smooth() +
  theme(legend.position = 'bottom')

Note the use of the fig.cap chunk option to provide a figure caption. You can adjust the proportions of figures using the fig.width and fig.height chunk options. These are specified in inches, and will be automatically scaled down to fit within the handout margin.

Arbitrary Margin Content

In fact, you can include anything in the margin using the knitr engine named marginfigure. Unlike R code chunks ```{r}, you write a chunk starting with ```{marginfigure} instead, then put the content in the chunk. See an example on the right about the first fundamental theorem of calculus.

We know from the first fundamental theorem of calculus that for \(x\) in \([a, b]\): \[\frac{d}{dx}\left( \int_{a}^{x} f(u)\,du\right)=f(x).\]

For the sake of portability between LaTeX and HTML, you should keep the margin content as simple as possible (syntax-wise) in the marginfigure blocks. You may use simple Markdown syntax like **bold** and _italic_ text, but please refrain from using footnotes, citations, or block-level elements (e.g. blockquotes and lists) there.

Note: if you set echo = FALSE in your global chunk options, you will have to add echo = TRUE to the chunk to display a margin figure, for example ```{marginfigure, echo = TRUE}.

Full Width Figures

You can arrange for figures to span across the entire page by using the chunk option fig.fullwidth = TRUE.

ggplot(diamonds, aes(carat, price)) + geom_smooth() +
  facet_grid(~ cut)

A full width figure.

Other chunk options related to figures can still be used, such as fig.width, fig.cap, out.width, and so on. For full width figures, usually fig.width is large and fig.height is small. In the above example, the plot size is \(10 \times 2\).

Arbitrary Full Width Content

Any content can span the full width of the page. This feature requires Pandoc 2.0 or above. All you need is to put your content in a fenced Div with the class fullwidth, e.g.,

::: {.fullwidth}
Any _full width_ content here.
:::

Below is an example:

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License versions 2 or 3. For more information about these matters see https://www.gnu.org/licenses/.
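Since fenced Divs depend on Pandoc 2.0 or above, it may help to check the installed version before relying on them. The following is a minimal sketch that assumes the rmarkdown package is installed; the variable names required and has_pandoc are made up for illustration.

```r
# Minimum Pandoc version needed for fenced Divs such as ::: {.fullwidth}
required <- "2.0"

if (requireNamespace("rmarkdown", quietly = TRUE)) {
  # TRUE only if Pandoc is found and its version is at least `required`
  has_pandoc <- rmarkdown::pandoc_available(required)
} else {
  has_pandoc <- FALSE
}

if (has_pandoc) {
  message(
    "Pandoc ", as.character(rmarkdown::pandoc_version()),
    " found; fullwidth Divs should work."
  )
} else {
  message("Pandoc >= ", required, " not found; fullwidth Divs may not render.")
}
```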

Main Column Figures

Besides margin and full width figures, you can of course also include figures constrained to the main column. This is the default type of figures in the LaTeX/HTML output.

ggplot(diamonds, aes(cut, price)) + geom_boxplot()

A figure in the main column.

Sidenotes

One of the most prominent and distinctive features of this style is the extensive use of sidenotes. There is a wide margin to provide ample room for sidenotes and small figures. Any use of a footnote will automatically be converted to a sidenote. (Sidenote: This is a sidenote that was entered using a footnote.)

If you’d like to place ancillary information in the margin without the sidenote mark (the superscript number), you can use the margin_note() function from tufte in an inline R expression. (Margin note: This is a margin note. Notice that there is no number preceding the note.) This function does not process the text with Pandoc, so Markdown syntax will not work here. If you need to write anything in Markdown syntax, please use the marginfigure block described previously.

References

References can be displayed as margin notes for HTML output. For example, we can cite R here (R Core Team 2021), and the full reference appears in the margin: R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/. To enable this feature, you must set link-citations: yes in the YAML metadata, and the version of pandoc-citeproc should be at least 0.7.2. You can always install your own version of Pandoc from https://pandoc.org/installing.html if the version is not sufficient. To check the version of pandoc-citeproc in your system, you may run this in R:

system2('pandoc-citeproc', '--version')

If your version of pandoc-citeproc is too low, or you did not set link-citations: yes in YAML, references in the HTML output will be placed at the end of the output document.

Tables

You can use the kable() function from the knitr package to format tables that integrate well with the rest of the Tufte handout style. The table captions are placed in the margin like figures in the HTML output.

knitr::kable(
  mtcars[1:6, 1:6], caption = 'A subset of mtcars.'
)

A subset of mtcars.

                    mpg  cyl  disp   hp  drat     wt
Mazda RX4          21.0    6   160  110  3.90  2.620
Mazda RX4 Wag      21.0    6   160  110  3.90  2.875
Datsun 710         22.8    4   108   93  3.85  2.320
Hornet 4 Drive     21.4    6   258  110  3.08  3.215
Hornet Sportabout  18.7    8   360  175  3.15  3.440
Valiant            18.1    6   225  105  2.76  3.460
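As a small variation on the call above, kable() also accepts arguments such as digits and col.names to control rounding and header text. The sketch below assumes only that knitr is installed; the column labels and the name tbl are illustrative, not part of the original example.

```r
library(knitr)

# Same subset as above, rounded to one decimal place and with
# more descriptive column headers (the row-name column is kept as is)
tbl <- kable(
  mtcars[1:6, 1:6],
  caption = "A subset of mtcars.",
  digits = 1,
  col.names = c("MPG", "Cylinders", "Displacement", "HP", "Axle ratio", "Weight")
)
tbl
```

col.names needs one entry per data column; kable() adds the header for the row names itself.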

Block Quotes

We know from the Markdown syntax that paragraphs that start with > are converted to block quotes. If you want to add a right-aligned footer for the quote, you may use the function quote_footer() from tufte in an inline R expression. Here is an example:

“If it weren’t for my lawyer, I’d still be in prison. It went a lot faster with two people digging.”

— Joe Martin

Without using quote_footer(), it looks like this (the second line is just a normal paragraph):

“Great people talk about ideas, average people talk about things, and small people talk about wine.”

— Fran Lebowitz

Responsiveness

The HTML page is responsive in the sense that when the page width is smaller than 760px, sidenotes and margin notes will be hidden by default. For sidenotes, you can click their numbers (the superscripts) to toggle their visibility. For margin notes, you may click the circled plus signs to toggle visibility.

More Examples

The rest of this document consists of a few test cases to make sure everything still works well in slightly more complicated scenarios. First we generate two plots in one figure environment with the chunk option fig.show = 'hold':

p <- ggplot(mtcars2, aes(hp, mpg, color = am)) +
  geom_point()
p
p + geom_smooth()

Two plots in one figure environment.

Then two plots in separate figure environments (the code is identical to the previous code chunk, but the chunk option is the default fig.show = 'asis' now):

p <- ggplot(mtcars2, aes(hp, mpg, color = am)) +
  geom_point()
p

Two plots in separate figure environments (the first plot).

p + geom_smooth()

Two plots in separate figure environments (the second plot).

You may have noticed that the two figures have different captions, and that is because we used a character vector of length 2 for the chunk option fig.cap (something like fig.cap = c('first plot', 'second plot')).
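A chunk along the following lines would produce the two separately captioned figures described here. It is a sketch only: the `#|` option line assumes knitr 1.35 or later (otherwise set fig.cap in the chunk header), and the caption strings are the placeholder ones from the sentence above.

```r
#| fig.cap = c("first plot", "second plot")

library(ggplot2)

mtcars2 <- mtcars
mtcars2$am <- factor(mtcars$am, labels = c("automatic", "manual"))

p <- ggplot(mtcars2, aes(hp, mpg, color = am)) +
  geom_point()

p                  # first figure, captioned "first plot"
p + geom_smooth()  # second figure, captioned "second plot"
```

With fig.show = 'hold', both plots would instead share one figure environment, so a single caption would be the natural choice there.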

Next we show multiple plots in margin figures. Similarly, two plots in the same figure environment in the margin:

Two plots in one figure environment in the margin.

p
p + geom_smooth(method = 'lm')

Then two plots from the same code chunk placed in different figure environments:

knitr::kable(head(iris, 15))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa
4.8 3.0 1.4 0.1 setosa
4.3 3.0 1.1 0.1 setosa
5.8 4.0 1.2 0.2 setosa

Two plots in separate figure environments in the margin (the first plot).

p
knitr::kable(head(iris, 12))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
5.4 3.7 1.5 0.2 setosa
4.8 3.4 1.6 0.2 setosa

Two plots in separate figure environments in the margin (the second plot).

p + geom_smooth(method = 'lm')
knitr::kable(head(iris, 5))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa

We inserted some tables in the above code chunk only as placeholders, to make sure there is enough vertical space between the margin figures; otherwise they would be stacked tightly together. In a practical document, you should avoid inserting too many margin figures consecutively, because that makes the margin crowded.

You do not have to assign captions to figures. We show three figures with no captions below in the margin, in the main column, and in full width, respectively.

# a boxplot of weight vs transmission; this figure
# will be placed in the margin
ggplot(mtcars2, aes(am, wt)) + geom_boxplot() +
  coord_flip()
# a figure in the main column
p <- ggplot(mtcars, aes(wt, hp)) + geom_point()
p

# a fullwidth figure
p + geom_smooth(method = 'lm') + facet_grid(~ gear)

Some Notes on Tufte CSS

There are a few other things in Tufte CSS that we have not mentioned so far. If you prefer sans-serif fonts, use the function sans_serif() in tufte. For epigraphs, you may use a pair of underscores to make the paragraph italic in a block quote, e.g.

I can win an argument on any topic, against any opponent. People know this, and steer clear of me at parties. Often, as a sign of their great respect, they don’t even invite me.

— Dave Barry

We hope you will enjoy the simplicity of R Markdown and this R package, and we sincerely thank the authors of the Tufte-CSS and Tufte-LaTeX projects for developing the beautiful CSS and LaTeX classes. Our tufte package would not have been possible without their heavy lifting.

You can turn on/off some features of the Tufte style in HTML output. The default features enabled are:

output:
  tufte::tufte_html:
    tufte_features: ["fonts", "background", "italics"]

If you do not want the page background to be lightyellow, you can remove background from tufte_features. You can also customize the style of the HTML page via a CSS file. For example, if you do not want the subtitle to be italic, you can define

h3.subtitle em {
  font-style: normal;
}

in, say, a CSS file my_style.css (in the same directory as your Rmd document), and apply it to your HTML output via the css option, e.g.,

output:
  tufte::tufte_html:
    tufte_features: ["fonts", "background"]
    css: "my_style.css"

There is also a variant of the Tufte style in HTML/CSS named “Envisioned CSS.” The actual Envisioned CSS was not used in the tufte package; we only changed the fonts, background color, and text color based on the default Tufte style. This variant can be used by specifying the argument tufte_variant = 'envisioned' in tufte_html(), e.g.

output:
  tufte::tufte_html:
    tufte_variant: "envisioned"
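The same options can also be passed from R rather than YAML, which may be handy in a build script. The sketch below assumes the tufte and rmarkdown packages are installed; render_handout() and the file name handout.Rmd are hypothetical names introduced for illustration.

```r
# Hypothetical input file; replace with the path to your own Rmd document
input_file <- "handout.Rmd"

# Render a document with tufte_html(), mirroring the YAML options above.
# `css` is optional and is passed on to rmarkdown::html_document().
render_handout <- function(input,
                           features = c("fonts", "background"),
                           variant = c("default", "envisioned"),
                           css = NULL) {
  variant <- match.arg(variant)
  fmt <- tufte::tufte_html(
    tufte_features = features,
    tufte_variant = variant,
    css = css
  )
  rmarkdown::render(input, output_format = fmt)
}

# Only render when the (hypothetical) input file actually exists
if (file.exists(input_file)) {
  render_handout(input_file, variant = "envisioned")
}
```

Dropping "background" from features corresponds to removing it from tufte_features in the YAML header.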

To see the R Markdown source of this example document, you may follow this link to GitHub, use the wizard in RStudio IDE (File -> New File -> R Markdown -> From Template), or open the Rmd file in the package:

file.edit(
  tufte:::template_resources(
    'tufte_html', '..', 'skeleton', 'skeleton.Rmd'
  )
)

This document is also available in Chinese, and its envisioned style can be found here.